The growing gap between data and users calls for innovative tools that address the challenges posed by big data volume, velocity, and variety. Along with these standard three V's of big data, an emerging fourth "V" is veracity, which addresses the confidentiality, integrity, and availability of the data. Traditional cryptographic techniques that ensure the veracity of data can have overheads that are too large to apply to big data. This work introduces a new technique called Computing on Masked Data (CMD), which improves data veracity by allowing computations to be performed directly on masked data and ensuring that only authorized recipients can unmask the data. Using the sparse linear algebra of associative arrays, CMD can be performed with significantly less overhead than other approaches while still supporting a wide range of linear algebraic operations on the masked data. Databases with strong support for sparse operations, such as SciDB or Apache Accumulo, are ideally suited to this technique. Examples show the application of CMD to a complex DNA matching algorithm and to database operations over social media data.
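To illustrate why sparse associative-array algebra can run on masked data, here is a minimal sketch under stated assumptions. It is not the authors' implementation. Associative arrays are modeled as plain Python dicts of the form `{row: {col: value}}`. The masking is deterministic, so identical keys produce identical masks. As an assumed swap-in for a deterministic encryption scheme, it uses HMAC-SHA256 from the standard library. Because HMAC cannot be inverted, "unmasking" here relies on a lookup table that only the authorized recipient holds. All function names, the key, and the DNA-style sample data are hypothetical. The operation shown is a sparse `A * B^T` product that counts shared k-mers between samples, in the spirit of a DNA-matching step. It needs only equality comparisons on column keys, so computing it on masked keys yields the masked form of the plaintext result.

```python
import hashlib
import hmac
from collections import defaultdict


def mask(key: bytes, label: str) -> str:
    """Deterministically mask a label (assumption: HMAC-SHA256 stands in for
    deterministic encryption; equal labels always map to equal masks)."""
    return hmac.new(key, label.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_array(key: bytes, A: dict, table: dict) -> dict:
    """Mask the row and column keys of an associative array {row: {col: value}}.
    Records mask -> original label in `table` for the authorized recipient."""
    masked = {}
    for r, cols in A.items():
        mr = mask(key, r)
        table[mr] = r
        masked[mr] = {}
        for c, v in cols.items():
            mc = mask(key, c)
            table[mc] = c
            masked[mr][mc] = v
    return masked


def multiply_transpose(A: dict, B: dict) -> dict:
    """Sparse C = A * B^T: C[i][j] = sum of A[i][k] * B[j][k] over shared columns k.
    Only key equality is tested, so this works unchanged on masked keys."""
    C = defaultdict(dict)
    for i, arow in A.items():
        for j, brow in B.items():
            shared = arow.keys() & brow.keys()
            if shared:
                C[i][j] = sum(arow[k] * brow[k] for k in shared)
    return dict(C)


def unmask_array(C: dict, table: dict) -> dict:
    """Replace masked row/column keys with their originals (recipient only)."""
    return {table[i]: {table[j]: v for j, v in row.items()} for i, row in C.items()}


# Hypothetical usage: samples and references indexed by k-mer columns.
key = b"hypothetical-shared-secret"
samples = {"sample1": {"ACGT": 1, "CGTA": 1, "GTAC": 1},
           "sample2": {"ACGT": 1, "TTTT": 1}}
references = {"ref1": {"ACGT": 1, "GTAC": 1},
              "ref2": {"TTTT": 1}}

table = {}
mA = mask_array(key, samples, table)
mB = mask_array(key, references, table)
masked_result = multiply_transpose(mA, mB)  # computed without seeing any labels
print(unmask_array(masked_result, table))
# {'sample1': {'ref1': 2}, 'sample2': {'ref1': 1, 'ref2': 1}}
```

The party performing `multiply_transpose` sees only hex digests and counts, yet the unmasked result matches the plaintext product. Operations that depend on ordering or arithmetic over the masked values themselves would need different masking schemes; this sketch covers only the equality-based case.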